Overview
Dataset statistics
| Number of variables | 29 |
|---|---|
| Number of observations | 9480 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 10 |
| Duplicate rows (%) | 0.1% |
| Total size in memory | 2.6 MiB |
| Average record size in memory | 286.0 B |
Variable types
| Text | 1 |
|---|---|
| DateTime | 5 |
| Numeric | 8 |
| Categorical | 14 |
| Boolean | 1 |
Alerts
| Alert | Type |
|---|---|
| Dataset has 10 (0.1%) duplicate rows | Duplicates |
| brand_encoded is highly overall correlated with helpful_missing_flag and 2 other fields | High correlation |
| category_group_encoded is highly overall correlated with purchase_missing_flag | High correlation |
| fake_review_label is highly overall correlated with username_dup_flag | High correlation |
| helpful_missing_flag is highly overall correlated with brand_encoded and 1 other field | High correlation |
| is_short is highly overall correlated with repetition_score | High correlation |
| log_helpful is highly overall correlated with no_helpful_votes_flag and 1 other field | High correlation |
| multi_review_same_product_flag is highly overall correlated with username_dup_flag | High correlation |
| no_helpful_votes_flag is highly overall correlated with log_helpful | High correlation |
| product_name_match_flag is highly overall correlated with semantic_mismatch_score | High correlation |
| purchase_encoded is highly overall correlated with purchase_missing_flag | High correlation |
| purchase_missing_flag is highly overall correlated with brand_encoded and 2 other fields | High correlation |
| recommend_encoded is highly overall correlated with recommend_missing_flag and 1 other field | High correlation |
| recommend_missing_flag is highly overall correlated with recommend_encoded | High correlation |
| repetition_score is highly overall correlated with is_short and 2 other fields | High correlation |
| review_length is highly overall correlated with repetition_score and 1 other field | High correlation |
| reviews.numHelpful is highly overall correlated with log_helpful | High correlation |
| reviews.rating is highly overall correlated with recommend_encoded | High correlation |
| semantic_mismatch_score is highly overall correlated with helpful_missing_flag and 2 other fields | High correlation |
| text_length is highly overall correlated with repetition_score and 1 other field | High correlation |
| unrelated_product_flag is highly overall correlated with brand_encoded and 1 other field | High correlation |
| username_dup_flag is highly overall correlated with fake_review_label and 1 other field | High correlation |
| recommend_missing_flag is highly imbalanced (80.3%) | Imbalance |
| recommend_encoded is highly imbalanced (71.5%) | Imbalance |
| no_helpful_votes_flag is highly imbalanced (73.1%) | Imbalance |
| is_short is highly imbalanced (75.0%) | Imbalance |
| username_dup_flag is highly imbalanced (58.7%) | Imbalance |
| multi_review_same_day_flag is highly imbalanced (96.6%) | Imbalance |
| multi_review_same_product_flag is highly imbalanced (79.2%) | Imbalance |
| fake_review_label is highly imbalanced (62.8%) | Imbalance |
| reviews.numHelpful is highly skewed (γ1 = 31.6344377) | Skewed |
| reviews.numHelpful has 9045 (95.4%) zeros | Zeros |
| log_helpful has 9045 (95.4%) zeros | Zeros |
| sentiment_polarity has 562 (5.9%) zeros | Zeros |
| category_group_encoded has 375 (4.0%) zeros | Zeros |
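The imbalance percentages reported above are consistent with a score of one minus the normalized Shannon entropy of the class counts. The sketch below is an illustration under that assumption, not a copy of the profiler's code; `imbalance_score` is a hypothetical helper, and the class counts are the ones listed for each variable later in this report.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits)
    of the class distribution and k is the number of classes."""
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# Class counts taken from the per-variable tables in this report.
class_counts = {
    "recommend_missing_flag": [9190, 290],
    "recommend_encoded": [8782, 408, 290],
    "is_short": [9085, 395],
    "multi_review_same_day_flag": [9446, 34],
}

scores = {name: imbalance_score(counts) for name, counts in class_counts.items()}
for name, score in scores.items():
    print(f"{name}: {100 * score:.1f}%")
```

A score near 100% means almost all rows fall in one class; a perfectly balanced variable scores 0%.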
Reproduction
| Analysis started | 2025-10-02 19:04:51.905999 |
|---|---|
| Analysis finished | 2025-10-02 19:05:04.421797 |
| Duration | 12.52 seconds |
| Software version | ydata-profiling v4.17.0 |
Variables
id
Text
| Distinct | 77 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 638.9 KiB |
Length
| Max length | 20 |
|---|---|
| Median length | 20 |
| Mean length | 20 |
| Min length | 20 |
Unique
| Unique | 18 |
|---|---|
| Unique (%) | 0.2% |
Sample
| 1st row | AV13O1A8GV-KLJ3akUyj |
|---|---|
| 2nd row | AV14LG0R-jtxr-f38QfS |
| 3rd row | AV14LG0R-jtxr-f38QfS |
| 4th row | AV16khLE-jtxr-f38VFn |
| 5th row | AV16khLE-jtxr-f38VFn |
| Value | Count | Frequency (%) |
| avpf3vofilapnd_xjpun | 3264 | 34.4% |
| avpf0eb2ljejml43evst | 847 | 8.9% |
| avpe41tqilapnd_xqh3d | 757 | 8.0% |
| avpf2tw1ilapnd_xjflc | 669 | 7.1% |
| avpe59io1cnluz0-zgdu | 668 | 7.0% |
| av1l8zrzvkc47qavhnav | 644 | 6.8% |
| avpe8gsiljejml43y6ed | 367 | 3.9% |
| av1ygdqsgv-klj3adc-o | 333 | 3.5% |
| avpe31o71cnluz0-yrsd | 245 | 2.6% |
| avpe9w4d1cnluz0-avf0 | 213 | 2.2% |
| Other values (67) | 1473 | 15.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| A | 15869 | 8.4% |
| V | 14790 | 7.8% |
| p | 11423 | 6.0% |
| n | 10494 | 5.5% |
| f | 9667 | 5.1% |
| l | 8039 | 4.2% |
| 3 | 6743 | 3.6% |
| D | 6394 | 3.4% |
| i | 5841 | 3.1% |
| e | 5537 | 2.9% |
| Other values (54) | 94803 | 50.0% |
dateAdded
Date
| Distinct | 77 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 74.2 KiB |
| Minimum | 2014-02-18 02:01:47+00:00 |
| Maximum | 2017-07-26 23:26:15+00:00 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
dateUpdated
Date
| Distinct | 69 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 74.2 KiB |
| Minimum | 2018-01-30 06:08:52+00:00 |
| Maximum | 2018-02-05 11:30:15+00:00 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
reviews.date
Date
| Distinct | 4041 |
|---|---|
| Distinct (%) | 42.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 74.2 KiB |
| Minimum | 2007-08-07 00:00:00+00:00 |
| Maximum | 2018-01-10 19:56:37+00:00 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
reviews.dateAdded
Date
| Distinct | 985 |
|---|---|
| Distinct (%) | 10.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 74.2 KiB |
| Minimum | 2017-03-10 06:55:39+00:00 |
| Maximum | 2018-02-05 10:21:42+00:00 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
reviews.dateSeen
Date
| Distinct | 342 |
|---|---|
| Distinct (%) | 3.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 74.2 KiB |
| Minimum | 2017-03-09 02:48:00+00:00 |
| Maximum | 2018-01-26 05:42:00+00:00 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
reviews.numHelpful
Real number (ℝ)
High correlation Skewed Zeros
| Distinct | 38 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.20875527 |
| Minimum | 0 |
| Maximum | 141 |
| Zeros | 9045 |
| Zeros (%) | 95.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 74.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 141 |
| Range | 141 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 2.5020695 |
|---|---|
| Coefficient of variation (CV) | 11.985659 |
| Kurtosis | 1397.2197 |
| Mean | 0.20875527 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 31.634438 |
| Sum | 1979 |
| Variance | 6.2603517 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 0 | 9045 | 95.4% |
| 1 | 232 | 2.4% |
| 2 | 54 | 0.6% |
| 3 | 43 | 0.5% |
| 4 | 24 | 0.3% |
| 6 | 13 | 0.1% |
| 7 | 10 | 0.1% |
| 5 | 9 | 0.1% |
| 8 | 5 | 0.1% |
| 12 | 4 | < 0.1% |
| Other values (28) | 41 | 0.4% |
Minimum values
| Value | Count | Frequency (%) |
| 0 | 9045 | 95.4% |
| 1 | 232 | 2.4% |
| 2 | 54 | 0.6% |
| 3 | 43 | 0.5% |
| 4 | 24 | 0.3% |
| 5 | 9 | 0.1% |
| 6 | 13 | 0.1% |
| 7 | 10 | 0.1% |
| 8 | 5 | 0.1% |
| 9 | 4 | < 0.1% |
Maximum values
| Value | Count | Frequency (%) |
| 141 | 1 | < 0.1% |
| 96 | 1 | < 0.1% |
| 56 | 1 | < 0.1% |
| 52 | 1 | < 0.1% |
| 47 | 1 | < 0.1% |
| 46 | 1 | < 0.1% |
| 45 | 1 | < 0.1% |
| 41 | 1 | < 0.1% |
| 39 | 1 | < 0.1% |
| 35 | 1 | < 0.1% |
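The descriptive statistics for reviews.numHelpful can be cross-checked against each other: the mean is the sum divided by the number of observations, the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. A minimal sketch using only figures reported above (variable names are illustrative, and small differences in the last digit come from rounding in the report):

```python
# Inputs copied from the reviews.numHelpful statistics in this report.
n_rows = 9480          # Number of observations
total_votes = 1979     # Sum
std_dev = 2.5020695    # Standard deviation
zero_rows = 9045       # Zeros

mean = total_votes / n_rows          # reported: 0.20875527
variance = std_dev ** 2              # reported: 6.2603517
cv = std_dev / mean                  # reported: 11.985659
zeros_pct = 100 * zero_rows / n_rows # reported: 95.4%

print(f"mean={mean:.8f} variance={variance:.6f} cv={cv:.5f} zeros={zeros_pct:.1f}%")
```

A coefficient of variation near 12 together with 95.4% zeros is the numeric signature of the "Skewed" and "Zeros" alerts on this variable.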
reviews.rating
Categorical
High correlation
| Distinct | 5 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 463.0 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 5 |
|---|---|
| 2nd row | 5 |
| 3rd row | 5 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 5 | 6001 | 63.3% |
| 4 | 2697 | 28.4% |
| 3 | 425 | 4.5% |
| 1 | 227 | 2.4% |
| 2 | 130 | 1.4% |
purchase_missing_flag
Categorical
High correlation
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 463.0 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 6033 | 63.6% |
| 0 | 3447 | 36.4% |
purchase_encoded
Categorical
High correlation
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 463.0 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 2 |
| 3rd row | 2 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 6033 | 63.6% |
| 1 | 2959 | 31.2% |
| 2 | 488 | 5.1% |
recommend_missing_flag
Categorical
High correlation Imbalance
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 463.0 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 9190 | 96.9% |
| 1 | 290 | 3.1% |
recommend_encoded
Categorical
High correlation Imbalance
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 463.0 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 2 | 8782 | 92.6% |
| 1 | 408 | 4.3% |
| 0 | 290 | 3.1% |
helpful_missing_flag
Categorical
High correlation
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 463.0 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 5260 | 55.5% |
| 0 | 4220 | 44.5% |
no_helpful_votes_flag
Categorical
High correlation Imbalance
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 463.0 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 9045 | 95.4% |
| 0 | 435 | 4.6% |
log_helpful
Real number (ℝ)
High correlation Zeros
| Distinct | 38 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.055888228 |
| Minimum | 0 |
| Maximum | 4.9558271 |
| Zeros | 9045 |
| Zeros (%) | 95.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 74.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 4.9558271 |
| Range | 4.9558271 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.30461053 |
|---|---|
| Coefficient of variation (CV) | 5.4503523 |
| Kurtosis | 67.163843 |
| Mean | 0.055888228 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 7.3957479 |
| Sum | 529.8204 |
| Variance | 0.092787576 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 0 | 9045 | 95.4% |
| 0.6931471806 | 232 | 2.4% |
| 1.098612289 | 54 | 0.6% |
| 1.386294361 | 43 | 0.5% |
| 1.609437912 | 24 | 0.3% |
| 1.945910149 | 13 | 0.1% |
| 2.079441542 | 10 | 0.1% |
| 1.791759469 | 9 | 0.1% |
| 2.197224577 | 5 | 0.1% |
| 2.564949357 | 4 | < 0.1% |
| Other values (28) | 41 | 0.4% |
Minimum values
| Value | Count | Frequency (%) |
| 0 | 9045 | 95.4% |
| 0.6931471806 | 232 | 2.4% |
| 1.098612289 | 54 | 0.6% |
| 1.386294361 | 43 | 0.5% |
| 1.609437912 | 24 | 0.3% |
| 1.791759469 | 9 | 0.1% |
| 1.945910149 | 13 | 0.1% |
| 2.079441542 | 10 | 0.1% |
| 2.197224577 | 5 | 0.1% |
| 2.302585093 | 4 | < 0.1% |
Maximum values
| Value | Count | Frequency (%) |
| 4.955827058 | 1 | < 0.1% |
| 4.574710979 | 1 | < 0.1% |
| 4.043051268 | 1 | < 0.1% |
| 3.970291914 | 1 | < 0.1% |
| 3.871201011 | 1 | < 0.1% |
| 3.850147602 | 1 | < 0.1% |
| 3.828641396 | 1 | < 0.1% |
| 3.737669618 | 1 | < 0.1% |
| 3.688879454 | 1 | < 0.1% |
| 3.583518938 | 1 | < 0.1% |
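The values of log_helpful line up with the natural logarithm of one plus reviews.numHelpful, which would explain both the shared zero count (9045) and the "High correlation" alert between the two variables. The report does not state the transform, so the sketch below only checks that reading against value pairs matched by their row counts; the pairing itself is an assumption.

```python
import math

# (reviews.numHelpful value, reported log_helpful value), paired by matching
# row counts in the two frequency tables. The pairing is an inference.
reported_pairs = [
    (0, 0.0),
    (1, 0.6931471806),
    (2, 1.098612289),
    (12, 2.564949357),
    (141, 4.955827058),
]

# Largest absolute gap between log1p(votes) and the reported value.
max_abs_error = max(abs(math.log1p(votes) - logged) for votes, logged in reported_pairs)
print(f"max abs error: {max_abs_error:.2e}")
```

If the transform is indeed `log1p`, keeping both columns in a model adds little information, since one is a monotonic function of the other.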
text_length
Real number (ℝ)
High correlation
| Distinct | 126 |
|---|---|
| Distinct (%) | 1.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15.577848 |
| Minimum | 0 |
| Maximum | 553 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 74.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 5 |
| Q1 | 8 |
| median | 12 |
| Q3 | 18 |
| 95-th percentile | 39 |
| Maximum | 553 |
| Range | 553 |
| Interquartile range (IQR) | 10 |
Descriptive statistics
| Standard deviation | 15.110852 |
|---|---|
| Coefficient of variation (CV) | 0.9700218 |
| Kurtosis | 192.90745 |
| Mean | 15.577848 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 8.2539535 |
| Sum | 147678 |
| Variance | 228.33786 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 10 | 720 | 7.6% |
| 9 | 664 | 7.0% |
| 8 | 647 | 6.8% |
| 7 | 630 | 6.6% |
| 6 | 619 | 6.5% |
| 11 | 595 | 6.3% |
| 12 | 557 | 5.9% |
| 13 | 451 | 4.8% |
| 14 | 420 | 4.4% |
| 5 | 384 | 4.1% |
| Other values (116) | 3793 | 40.0% |
Minimum values
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 1 | 36 | 0.4% |
| 2 | 74 | 0.8% |
| 3 | 78 | 0.8% |
| 4 | 206 | 2.2% |
| 5 | 384 | 4.1% |
| 6 | 619 | 6.5% |
| 7 | 630 | 6.6% |
| 8 | 647 | 6.8% |
| 9 | 664 | 7.0% |
Maximum values
| Value | Count | Frequency (%) |
| 553 | 1 | < 0.1% |
| 229 | 1 | < 0.1% |
| 210 | 1 | < 0.1% |
| 199 | 1 | < 0.1% |
| 189 | 1 | < 0.1% |
| 180 | 1 | < 0.1% |
| 170 | 1 | < 0.1% |
| 164 | 1 | < 0.1% |
| 160 | 1 | < 0.1% |
| 159 | 1 | < 0.1% |
is_short
Boolean
High correlation Imbalance
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 9.4 KiB |
Common Values
| Value | Count | Frequency (%) |
| False | 9085 | 95.8% |
| True | 395 | 4.2% |
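The report does not say how is_short was derived. One reading that fits the counts is a cutoff on text_length: the number of rows with text_length below 5 equals the 395 rows where is_short is True. The sketch below tests candidate cutoffs against the reported counts; the cutoff is an inference from the tables, and `rows_below` is a hypothetical helper.

```python
# Row counts for the smallest text_length values, copied from the
# text_length minimum-values table in this report.
text_length_counts = {0: 1, 1: 36, 2: 74, 3: 78, 4: 206, 5: 384}
is_short_true_rows = 395  # rows where is_short is True

def rows_below(threshold):
    """Count rows whose text_length is strictly below the threshold."""
    return sum(n for length, n in text_length_counts.items() if length < threshold)

# Keep only the cutoffs that reproduce the reported is_short count.
matching_thresholds = [t for t in range(1, 7) if rows_below(t) == is_short_true_rows]
print(matching_thresholds)
```

Only one candidate cutoff in this range reproduces the count, but a matching total does not prove that the same rows are flagged.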
sentiment_polarity
Real number (ℝ)
Zeros
| Distinct | 2161 |
|---|---|
| Distinct (%) | 22.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.34275005 |
| Minimum | -1 |
| Maximum | 1 |
| Zeros | 562 |
| Zeros (%) | 5.9% |
| Negative | 607 |
| Negative (%) | 6.4% |
| Memory size | 74.2 KiB |
Quantile statistics
| Minimum | -1 |
|---|---|
| 5-th percentile | -0.05 |
| Q1 | 0.175 |
| median | 0.36190476 |
| Q3 | 0.5 |
| 95-th percentile | 0.8 |
| Maximum | 1 |
| Range | 2 |
| Interquartile range (IQR) | 0.325 |
Descriptive statistics
| Standard deviation | 0.26629874 |
|---|---|
| Coefficient of variation (CV) | 0.77694733 |
| Kurtosis | 1.2366286 |
| Mean | 0.34275005 |
| Median Absolute Deviation (MAD) | 0.15791667 |
| Skewness | -0.35748081 |
| Sum | 3249.2705 |
| Variance | 0.070915017 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 0.5 | 619 | 6.5% |
| 0 | 562 | 5.9% |
| 0.8 | 331 | 3.5% |
| 0.4333333333 | 216 | 2.3% |
| 0.25 | 211 | 2.2% |
| 0.4 | 197 | 2.1% |
| 0.7 | 173 | 1.8% |
| 0.6 | 147 | 1.6% |
| 1 | 142 | 1.5% |
| 0.3666666667 | 130 | 1.4% |
| Other values (2151) | 6752 | 71.2% |
Minimum values
| Value | Count | Frequency (%) |
| -1 | 11 | 0.1% |
| -0.85 | 1 | < 0.1% |
| -0.8166666667 | 1 | < 0.1% |
| -0.8 | 3 | < 0.1% |
| -0.75 | 3 | < 0.1% |
| -0.7142857143 | 3 | < 0.1% |
| -0.7 | 1 | < 0.1% |
| -0.7 | 8 | 0.1% |
| -0.6666666667 | 1 | < 0.1% |
| -0.65 | 1 | < 0.1% |
Maximum values
| Value | Count | Frequency (%) |
| 1 | 142 | 1.5% |
| 0.9333333333 | 3 | < 0.1% |
| 0.9 | 42 | 0.4% |
| 0.875 | 1 | < 0.1% |
| 0.8666666667 | 7 | 0.1% |
| 0.86 | 1 | < 0.1% |
| 0.85 | 1 | < 0.1% |
| 0.85 | 20 | 0.2% |
| 0.8333333333 | 13 | 0.1% |
| 0.825 | 3 | < 0.1% |
review_length
Real number (ℝ)
High correlation
| Distinct | 126 |
|---|---|
| Distinct (%) | 1.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15.577848 |
| Minimum | 0 |
| Maximum | 553 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 74.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 5 |
| Q1 | 8 |
| median | 12 |
| Q3 | 18 |
| 95-th percentile | 39 |
| Maximum | 553 |
| Range | 553 |
| Interquartile range (IQR) | 10 |
Descriptive statistics
| Standard deviation | 15.110852 |
|---|---|
| Coefficient of variation (CV) | 0.9700218 |
| Kurtosis | 192.90745 |
| Mean | 15.577848 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 8.2539535 |
| Sum | 147678 |
| Variance | 228.33786 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 10 | 720 | 7.6% |
| 9 | 664 | 7.0% |
| 8 | 647 | 6.8% |
| 7 | 630 | 6.6% |
| 6 | 619 | 6.5% |
| 11 | 595 | 6.3% |
| 12 | 557 | 5.9% |
| 13 | 451 | 4.8% |
| 14 | 420 | 4.4% |
| 5 | 384 | 4.1% |
| Other values (116) | 3793 | 40.0% |
Minimum values
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 1 | 36 | 0.4% |
| 2 | 74 | 0.8% |
| 3 | 78 | 0.8% |
| 4 | 206 | 2.2% |
| 5 | 384 | 4.1% |
| 6 | 619 | 6.5% |
| 7 | 630 | 6.6% |
| 8 | 647 | 6.8% |
| 9 | 664 | 7.0% |
Maximum values
| Value | Count | Frequency (%) |
| 553 | 1 | < 0.1% |
| 229 | 1 | < 0.1% |
| 210 | 1 | < 0.1% |
| 199 | 1 | < 0.1% |
| 189 | 1 | < 0.1% |
| 180 | 1 | < 0.1% |
| 170 | 1 | < 0.1% |
| 164 | 1 | < 0.1% |
| 160 | 1 | < 0.1% |
| 159 | 1 | < 0.1% |
username_dup_flag
Categorical
High correlation Imbalance
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 463.0 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 8693 | 91.7% |
| 1 | 787 | 8.3% |
multi_review_same_day_flag
Categorical
Imbalance
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 463.0 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 9446 | 99.6% |
| 1 | 34 | 0.4% |
multi_review_same_product_flag
Categorical
High correlation Imbalance
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 463.0 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 9169 | 96.7% |
| 1 | 311 | 3.3% |
brand_encoded
Real number (ℝ)
High correlation
| Distinct | 71 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 33.518038 |
| Minimum | 0 |
| Maximum | 70 |
| Zeros | 4 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 74.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 13 |
| Q1 | 13 |
| median | 25 |
| Q3 | 52 |
| 95-th percentile | 67 |
| Maximum | 70 |
| Range | 70 |
| Interquartile range (IQR) | 39 |
Descriptive statistics
| Standard deviation | 21.395622 |
|---|---|
| Coefficient of variation (CV) | 0.63833158 |
| Kurtosis | -1.5115072 |
| Mean | 33.518038 |
| Median Absolute Deviation (MAD) | 14 |
| Skewness | 0.31008717 |
| Sum | 317751 |
| Variance | 457.77265 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 13 | 3264 | 34.4% |
| 52 | 847 | 8.9% |
| 17 | 757 | 8.0% |
| 55 | 669 | 7.1% |
| 64 | 668 | 7.0% |
| 41 | 644 | 6.8% |
| 67 | 452 | 4.8% |
| 44 | 367 | 3.9% |
| 69 | 333 | 3.5% |
| 24 | 213 | 2.2% |
| Other values (61) | 1266 | 13.4% |
Minimum values
| Value | Count | Frequency (%) |
| 0 | 4 | < 0.1% |
| 1 | 5 | 0.1% |
| 2 | 86 | 0.9% |
| 3 | 34 | 0.4% |
| 4 | 73 | 0.8% |
| 5 | 2 | < 0.1% |
| 6 | 9 | 0.1% |
| 7 | 2 | < 0.1% |
| 8 | 41 | 0.4% |
| 9 | 25 | 0.3% |
Maximum values
| Value | Count | Frequency (%) |
| 70 | 59 | 0.6% |
| 69 | 333 | 3.5% |
| 68 | 1 | < 0.1% |
| 67 | 452 | 4.8% |
| 66 | 5 | 0.1% |
| 65 | 1 | < 0.1% |
| 64 | 668 | 7.0% |
| 63 | 1 | < 0.1% |
| 62 | 1 | < 0.1% |
| 61 | 2 | < 0.1% |
category_group_encoded
Real number (ℝ)
High correlation Zeros
| Distinct | 9 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.7056962 |
| Minimum | 0 |
| Maximum | 8 |
| Zeros | 375 |
| Zeros (%) | 4.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 74.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 7 |
| median | 7 |
| Q3 | 7 |
| 95-th percentile | 8 |
| Maximum | 8 |
| Range | 8 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.5398115 |
|---|---|
| Coefficient of variation (CV) | 0.22962738 |
| Kurtosis | 12.62331 |
| Mean | 6.7056962 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -3.6588946 |
| Sum | 63570 |
| Variance | 2.3710193 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 7 | 8034 | 84.7% |
| 8 | 808 | 8.5% |
| 0 | 375 | 4.0% |
| 3 | 140 | 1.5% |
| 5 | 54 | 0.6% |
| 1 | 36 | 0.4% |
| 6 | 18 | 0.2% |
| 2 | 13 | 0.1% |
| 4 | 2 | < 0.1% |
Minimum values
| Value | Count | Frequency (%) |
| 0 | 375 | 4.0% |
| 1 | 36 | 0.4% |
| 2 | 13 | 0.1% |
| 3 | 140 | 1.5% |
| 4 | 2 | < 0.1% |
| 5 | 54 | 0.6% |
| 6 | 18 | 0.2% |
| 7 | 8034 | 84.7% |
| 8 | 808 | 8.5% |
Maximum values
| Value | Count | Frequency (%) |
| 8 | 808 | 8.5% |
| 7 | 8034 | 84.7% |
| 6 | 18 | 0.2% |
| 5 | 54 | 0.6% |
| 4 | 2 | < 0.1% |
| 3 | 140 | 1.5% |
| 2 | 13 | 0.1% |
| 1 | 36 | 0.4% |
| 0 | 375 | 4.0% |
product_name_match_flag
Categorical
High correlation
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 463.0 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 5321 | 56.1% |
| 1 | 4159 | 43.9% |
unrelated_product_flag
Categorical
High correlation
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 463.0 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 1 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 6064 | 64.0% |
| 1 | 3416 | 36.0% |
semantic_mismatch_score
Categorical
High correlation
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 463.0 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 2 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 4375 | 46.1% |
| 0 | 2924 | 30.8% |
| 2 | 2181 | 23.0% |
repetition_score
Real number (ℝ)
High correlation
| Distinct | 236 |
|---|---|
| Distinct (%) | 2.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.12999361 |
| Minimum | 0 |
| Maximum | 1 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 74.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.052631579 |
| Q1 | 0.08 |
| median | 0.11111111 |
| Q3 | 0.15384615 |
| 95-th percentile | 0.25 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 0.073846154 |
Descriptive statistics
| Standard deviation | 0.090575472 |
|---|---|
| Coefficient of variation (CV) | 0.69676866 |
| Kurtosis | 34.92107 |
| Mean | 0.12999361 |
| Median Absolute Deviation (MAD) | 0.034188034 |
| Skewness | 4.5749584 |
| Sum | 1232.3394 |
| Variance | 0.0082039161 |
| Monotonicity | Not monotonic |
Common Values
| Value | Count | Frequency (%) |
| 0.1666666667 | 723 | 7.6% |
| 0.125 | 716 | 7.6% |
| 0.1 | 709 | 7.5% |
| 0.1428571429 | 693 | 7.3% |
| 0.1111111111 | 640 | 6.8% |
| 0.09090909091 | 547 | 5.8% |
| 0.2 | 515 | 5.4% |
| 0.08333333333 | 476 | 5.0% |
| 0.07692307692 | 364 | 3.8% |
| 0.07142857143 | 329 | 3.5% |
| Other values (226) | 3768 | 39.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 0.02083333333 | 1 | < 0.1% |
| 0.02272727273 | 1 | < 0.1% |
| 0.02380952381 | 1 | < 0.1% |
| 0.025 | 1 | < 0.1% |
| 0.0253164557 | 1 | < 0.1% |
| 0.02597402597 | 1 | < 0.1% |
| 0.02631578947 | 2 | < 0.1% |
| 0.02678571429 | 1 | < 0.1% |
| 0.02712477396 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 1 | 37 | 0.4% |
| 0.825 | 1 | < 0.1% |
| 0.75 | 2 | < 0.1% |
| 0.6666666667 | 1 | < 0.1% |
| 0.6470588235 | 1 | < 0.1% |
| 0.5 | 85 | 0.9% |
| 0.4615384615 | 1 | < 0.1% |
| 0.4285714286 | 5 | 0.1% |
| 0.4 | 30 | 0.3% |
| 0.375 | 6 | 0.1% |
fake_review_label
Categorical
High correlation, Imbalance
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 463.0 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 8800 | 92.8% |
| 1 | 680 | 7.2% |
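The Imbalance alert on fake_review_label follows directly from these two counts. As a rough sketch (the variable names are illustrative, not part of the report), the class shares, the majority-to-minority ratio, and the accuracy of a trivial "always predict 0" baseline can be derived like this:

```python
# Label counts copied from the Common Values table for fake_review_label.
label_counts = {0: 8800, 1: 680}

n = sum(label_counts.values())

# Percentage share of each class, rounded to one decimal like the report.
shares = {label: round(100 * count / n, 1) for label, count in label_counts.items()}

# How many genuine (0) reviews exist for every flagged (1) review.
imbalance_ratio = label_counts[0] / label_counts[1]

# Accuracy of a baseline that always predicts the majority class.
majority_baseline_accuracy = max(label_counts.values()) / n

print(shares, round(imbalance_ratio, 2), round(majority_baseline_accuracy, 3))
```

Any classifier trained on this label should therefore be compared against the majority baseline rather than against 50%.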
Interactions
Correlations
| | brand_encoded | category_group_encoded | fake_review_label | helpful_missing_flag | is_short | log_helpful | multi_review_same_day_flag | multi_review_same_product_flag | no_helpful_votes_flag | product_name_match_flag | purchase_encoded | purchase_missing_flag | recommend_encoded | recommend_missing_flag | repetition_score | review_length | reviews.numHelpful | reviews.rating | semantic_mismatch_score | sentiment_polarity | text_length | unrelated_product_flag | username_dup_flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| brand_encoded | 1.000 | -0.149 | 0.218 | 0.837 | 0.186 | 0.009 | 0.067 | 0.180 | 0.319 | 0.316 | 0.472 | 0.601 | 0.266 | 0.310 | 0.225 | -0.168 | 0.009 | 0.157 | 0.395 | -0.031 | -0.168 | 0.513 | 0.222 |
| category_group_encoded | -0.149 | 1.000 | 0.055 | 0.253 | 0.059 | 0.046 | 0.090 | 0.034 | 0.115 | 0.152 | 0.379 | 0.509 | 0.142 | 0.053 | -0.061 | 0.075 | 0.046 | 0.127 | 0.149 | -0.012 | 0.075 | 0.167 | 0.028 |
| fake_review_label | 0.218 | 0.055 | 1.000 | 0.204 | 0.162 | 0.019 | 0.157 | 0.462 | 0.010 | 0.168 | 0.099 | 0.077 | 0.020 | 0.000 | 0.276 | 0.007 | 0.000 | 0.043 | 0.189 | 0.091 | 0.007 | 0.105 | 0.842 |
| helpful_missing_flag | 0.837 | 0.253 | 0.204 | 1.000 | 0.188 | 0.243 | 0.028 | 0.045 | 0.244 | 0.300 | 0.417 | 0.308 | 0.138 | 0.136 | 0.394 | 0.026 | 0.054 | 0.111 | 0.525 | 0.083 | 0.026 | 0.465 | 0.154 |
| is_short | 0.186 | 0.059 | 0.162 | 0.188 | 1.000 | 0.186 | 0.000 | 0.000 | 0.160 | 0.129 | 0.242 | 0.100 | 0.006 | 0.000 | 0.720 | 0.022 | 0.000 | 0.041 | 0.088 | 0.174 | 0.022 | 0.023 | 0.046 |
| log_helpful | 0.009 | 0.046 | 0.019 | 0.243 | 0.186 | 1.000 | 0.000 | 0.000 | 1.000 | 0.038 | 0.177 | 0.187 | 0.038 | 0.009 | 0.011 | 0.006 | 1.000 | 0.034 | 0.037 | -0.034 | 0.006 | 0.019 | 0.000 |
| multi_review_same_day_flag | 0.067 | 0.090 | 0.157 | 0.028 | 0.000 | 0.000 | 1.000 | 0.321 | 0.000 | 0.000 | 0.047 | 0.019 | 0.070 | 0.065 | 0.000 | 0.076 | 0.000 | 0.017 | 0.000 | 0.038 | 0.076 | 0.000 | 0.196 |
| multi_review_same_product_flag | 0.180 | 0.034 | 0.462 | 0.045 | 0.000 | 0.000 | 0.321 | 1.000 | 0.003 | 0.013 | 0.052 | 0.053 | 0.037 | 0.036 | 0.000 | 0.000 | 0.000 | 0.020 | 0.015 | 0.041 | 0.000 | 0.002 | 0.611 |
| no_helpful_votes_flag | 0.319 | 0.115 | 0.010 | 0.244 | 0.160 | 1.000 | 0.000 | 0.003 | 1.000 | 0.030 | 0.244 | 0.185 | 0.038 | 0.000 | 0.147 | 0.148 | 0.238 | 0.064 | 0.051 | 0.045 | 0.148 | 0.022 | 0.000 |
| product_name_match_flag | 0.316 | 0.152 | 0.168 | 0.300 | 0.129 | 0.038 | 0.000 | 0.013 | 0.030 | 1.000 | 0.187 | 0.181 | 0.089 | 0.087 | 0.319 | 0.145 | 0.028 | 0.092 | 0.787 | 0.140 | 0.145 | 0.116 | 0.074 |
| purchase_encoded | 0.472 | 0.379 | 0.099 | 0.417 | 0.242 | 0.177 | 0.047 | 0.052 | 0.244 | 0.187 | 1.000 | 1.000 | 0.107 | 0.084 | 0.226 | 0.097 | 0.057 | 0.169 | 0.195 | 0.084 | 0.097 | 0.179 | 0.083 |
| purchase_missing_flag | 0.601 | 0.509 | 0.077 | 0.308 | 0.100 | 0.187 | 0.019 | 0.053 | 0.185 | 0.181 | 1.000 | 1.000 | 0.146 | 0.073 | 0.210 | 0.127 | 0.065 | 0.208 | 0.266 | 0.083 | 0.127 | 0.170 | 0.075 |
| recommend_encoded | 0.266 | 0.142 | 0.020 | 0.138 | 0.006 | 0.038 | 0.070 | 0.037 | 0.038 | 0.089 | 0.107 | 0.146 | 1.000 | 1.000 | 0.066 | 0.131 | 0.000 | 0.543 | 0.053 | 0.159 | 0.131 | 0.080 | 0.026 |
| recommend_missing_flag | 0.310 | 0.053 | 0.000 | 0.136 | 0.000 | 0.009 | 0.065 | 0.036 | 0.000 | 0.087 | 0.084 | 0.073 | 1.000 | 1.000 | 0.093 | 0.169 | 0.000 | 0.177 | 0.074 | 0.092 | 0.169 | 0.066 | 0.010 |
| repetition_score | 0.225 | -0.061 | 0.276 | 0.394 | 0.720 | 0.011 | 0.000 | 0.000 | 0.147 | 0.319 | 0.226 | 0.210 | 0.066 | 0.093 | 1.000 | -0.741 | 0.011 | 0.042 | 0.221 | 0.216 | -0.741 | 0.148 | 0.096 |
| review_length | -0.168 | 0.075 | 0.007 | 0.026 | 0.022 | 0.006 | 0.076 | 0.000 | 0.148 | 0.145 | 0.097 | 0.127 | 0.131 | 0.169 | -0.741 | 1.000 | 0.006 | 0.045 | 0.046 | -0.252 | 1.000 | 0.080 | 0.000 |
| reviews.numHelpful | 0.009 | 0.046 | 0.000 | 0.054 | 0.000 | 1.000 | 0.000 | 0.000 | 0.238 | 0.028 | 0.057 | 0.065 | 0.000 | 0.000 | 0.011 | 0.006 | 1.000 | 0.000 | 0.012 | -0.034 | 0.006 | 0.000 | 0.000 |
| reviews.rating | 0.157 | 0.127 | 0.043 | 0.111 | 0.041 | 0.034 | 0.017 | 0.020 | 0.064 | 0.092 | 0.169 | 0.208 | 0.543 | 0.177 | 0.042 | 0.045 | 0.000 | 1.000 | 0.046 | 0.123 | 0.045 | 0.057 | 0.047 |
| semantic_mismatch_score | 0.395 | 0.149 | 0.189 | 0.525 | 0.088 | 0.037 | 0.000 | 0.015 | 0.051 | 0.787 | 0.195 | 0.266 | 0.053 | 0.074 | 0.221 | 0.046 | 0.012 | 0.046 | 1.000 | 0.085 | 0.046 | 0.771 | 0.095 |
| sentiment_polarity | -0.031 | -0.012 | 0.091 | 0.083 | 0.174 | -0.034 | 0.038 | 0.041 | 0.045 | 0.140 | 0.084 | 0.083 | 0.159 | 0.092 | 0.216 | -0.252 | -0.034 | 0.123 | 0.085 | 1.000 | -0.252 | 0.050 | 0.024 |
| text_length | -0.168 | 0.075 | 0.007 | 0.026 | 0.022 | 0.006 | 0.076 | 0.000 | 0.148 | 0.145 | 0.097 | 0.127 | 0.131 | 0.169 | -0.741 | 1.000 | 0.006 | 0.045 | 0.046 | -0.252 | 1.000 | 0.080 | 0.000 |
| unrelated_product_flag | 0.513 | 0.167 | 0.105 | 0.465 | 0.023 | 0.019 | 0.000 | 0.002 | 0.022 | 0.116 | 0.179 | 0.170 | 0.080 | 0.066 | 0.148 | 0.080 | 0.000 | 0.057 | 0.771 | 0.050 | 0.080 | 1.000 | 0.064 |
| username_dup_flag | 0.222 | 0.028 | 0.842 | 0.154 | 0.046 | 0.000 | 0.196 | 0.611 | 0.000 | 0.074 | 0.083 | 0.075 | 0.026 | 0.010 | 0.096 | 0.000 | 0.000 | 0.047 | 0.095 | 0.024 | 0.000 | 0.064 | 1.000 |
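The "High correlation" alerts can be cross-checked against this matrix. The sketch below filters one row by absolute correlation. The 0.5 cutoff is an assumption that happens to agree with the alerts listed in the overview, and `highly_correlated` is a hypothetical helper, not a profiler API. Only a subset of the brand_encoded row is copied here for brevity.

```python
# Assumed cutoff; consistent with the alerts shown in the report overview.
THRESHOLD = 0.5

# Selected entries from the brand_encoded row of the correlation table.
brand_encoded_row = {
    "category_group_encoded": -0.149,
    "fake_review_label": 0.218,
    "helpful_missing_flag": 0.837,
    "purchase_encoded": 0.472,
    "purchase_missing_flag": 0.601,
    "semantic_mismatch_score": 0.395,
    "unrelated_product_flag": 0.513,
    "username_dup_flag": 0.222,
}


def highly_correlated(row, threshold=THRESHOLD):
    """Return field names whose |correlation| meets the threshold, strongest first."""
    hits = [name for name, r in row.items() if abs(r) >= threshold]
    return sorted(hits, key=lambda name: -abs(row[name]))


print(highly_correlated(brand_encoded_row))
```

The result lists helpful_missing_flag plus two other fields, matching the alert text for brand_encoded.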
Missing values
Sample
First rows
| | id | dateAdded | dateUpdated | reviews.date | reviews.dateAdded | reviews.dateSeen | reviews.numHelpful | reviews.rating | purchase_missing_flag | purchase_encoded | recommend_missing_flag | recommend_encoded | helpful_missing_flag | no_helpful_votes_flag | log_helpful | text_length | is_short | sentiment_polarity | review_length | username_dup_flag | multi_review_same_day_flag | multi_review_same_product_flag | brand_encoded | category_group_encoded | product_name_match_flag | unrelated_product_flag | semantic_mismatch_score | repetition_score | fake_review_label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AV13O1A8GV-KLJ3akUyj | 2017-07-25 00:52:42+00:00 | 2018-02-05 08:36:58+00:00 | 2012-11-30 06:21:45+00:00 | 2018-02-04 07:28:12+00:00 | 2018-01-15 04:45:00+00:00 | 0.0 | 5 | 1 | 0 | 1 | 0 | 0 | 1 | 0.0 | 19 | False | 0.133333 | 19 | 1 | 0 | 0 | 65 | 1 | 0 | 0 | 1 | 0.052632 | 1 |
| 1 | AV14LG0R-jtxr-f38QfS | 2017-07-25 05:16:03+00:00 | 2018-02-05 11:27:45+00:00 | 2017-07-09 00:00:00+00:00 | 2017-09-23 02:53:06+00:00 | 2017-09-16 09:45:00+00:00 | 0.0 | 5 | 0 | 2 | 1 | 0 | 1 | 1 | 0.0 | 6 | False | 0.700000 | 6 | 1 | 1 | 1 | 33 | 2 | 0 | 0 | 1 | 0.166667 | 1 |
| 2 | AV14LG0R-jtxr-f38QfS | 2017-07-25 05:16:03+00:00 | 2018-02-05 11:27:45+00:00 | 2017-07-09 00:00:00+00:00 | 2017-09-06 04:49:31+00:00 | 2017-08-23 10:37:00+00:00 | 0.0 | 5 | 0 | 2 | 1 | 0 | 1 | 1 | 0.0 | 2 | True | 0.700000 | 2 | 1 | 1 | 1 | 33 | 2 | 0 | 0 | 1 | 0.500000 | 1 |
| 3 | AV16khLE-jtxr-f38VFn | 2017-07-25 16:26:19+00:00 | 2018-02-05 11:25:51+00:00 | 2016-01-06 00:00:00+00:00 | 2017-09-11 17:13:57+00:00 | 2017-09-04 12:18:00+00:00 | 0.0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0.0 | 55 | False | -0.007175 | 55 | 1 | 0 | 0 | 30 | 0 | 0 | 1 | 2 | 0.036364 | 1 |
| 4 | AV16khLE-jtxr-f38VFn | 2017-07-25 16:26:19+00:00 | 2018-02-05 11:25:51+00:00 | 2016-12-21 00:00:00+00:00 | 2017-09-11 17:13:57+00:00 | 2017-09-04 12:18:00+00:00 | 0.0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0.0 | 14 | False | 0.000000 | 14 | 0 | 0 | 0 | 30 | 0 | 1 | 0 | 0 | 0.214286 | 0 |
| 5 | AV16khLE-jtxr-f38VFn | 2017-07-25 16:26:19+00:00 | 2018-02-05 11:25:51+00:00 | 2016-04-20 00:00:00+00:00 | 2017-09-11 17:13:57+00:00 | 2017-09-04 12:18:00+00:00 | 0.0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0.0 | 21 | False | -0.012500 | 21 | 1 | 0 | 0 | 30 | 0 | 1 | 1 | 1 | 0.095238 | 1 |
| 6 | AV16khLE-jtxr-f38VFn | 2017-07-25 16:26:19+00:00 | 2018-02-05 11:25:51+00:00 | 2016-02-08 00:00:00+00:00 | 2017-09-11 17:13:57+00:00 | 2017-09-04 12:18:00+00:00 | 0.0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0.0 | 18 | False | -0.094643 | 18 | 0 | 0 | 0 | 30 | 0 | 0 | 0 | 1 | 0.055556 | 0 |
| 7 | AV16khLE-jtxr-f38VFn | 2017-07-25 16:26:19+00:00 | 2018-02-05 11:25:51+00:00 | 2016-02-21 00:00:00+00:00 | 2017-09-11 17:13:57+00:00 | 2017-09-04 12:18:00+00:00 | 0.0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0.0 | 18 | False | 0.170000 | 18 | 0 | 0 | 0 | 30 | 0 | 0 | 0 | 1 | 0.111111 | 0 |
| 8 | AV16khLE-jtxr-f38VFn | 2017-07-25 16:26:19+00:00 | 2018-02-05 11:25:51+00:00 | 2016-03-28 00:00:00+00:00 | 2017-09-11 17:13:57+00:00 | 2017-09-04 12:18:00+00:00 | 0.0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0.0 | 16 | False | -0.137500 | 16 | 0 | 0 | 0 | 30 | 0 | 0 | 0 | 1 | 0.062500 | 0 |
| 9 | AV16khLE-jtxr-f38VFn | 2017-07-25 16:26:19+00:00 | 2018-02-05 11:25:51+00:00 | 2016-03-21 00:00:00+00:00 | 2017-09-11 17:13:57+00:00 | 2017-09-04 12:18:00+00:00 | 0.0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0.0 | 17 | False | 0.071429 | 17 | 0 | 0 | 0 | 30 | 0 | 0 | 0 | 1 | 0.117647 | 0 |
Last rows
| | id | dateAdded | dateUpdated | reviews.date | reviews.dateAdded | reviews.dateSeen | reviews.numHelpful | reviews.rating | purchase_missing_flag | purchase_encoded | recommend_missing_flag | recommend_encoded | helpful_missing_flag | no_helpful_votes_flag | log_helpful | text_length | is_short | sentiment_polarity | review_length | username_dup_flag | multi_review_same_day_flag | multi_review_same_product_flag | brand_encoded | category_group_encoded | product_name_match_flag | unrelated_product_flag | semantic_mismatch_score | repetition_score | fake_review_label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9470 | AVpf3VOfilAPnD_xjpun | 2015-09-11 18:17:13+00:00 | 2018-02-05 08:35:02+00:00 | 2014-12-22 00:00:00+00:00 | 2017-08-24 00:14:12+00:00 | 2017-08-16 12:56:00+00:00 | 0.0 | 5 | 0 | 1 | 0 | 2 | 1 | 1 | 0.0 | 16 | False | 0.500000 | 16 | 0 | 0 | 0 | 13 | 7 | 1 | 1 | 1 | 0.062500 | 0 |
| 9471 | AVpf3VOfilAPnD_xjpun | 2015-09-11 18:17:13+00:00 | 2018-02-05 08:35:02+00:00 | 2014-12-22 00:00:00+00:00 | 2017-08-24 00:14:12+00:00 | 2017-08-16 12:56:00+00:00 | 0.0 | 5 | 0 | 1 | 0 | 2 | 1 | 1 | 0.0 | 16 | False | 0.388889 | 16 | 0 | 0 | 0 | 13 | 7 | 1 | 0 | 0 | 0.125000 | 0 |
| 9472 | AVpf3VOfilAPnD_xjpun | 2015-09-11 18:17:13+00:00 | 2018-02-05 08:35:02+00:00 | 2012-01-26 00:00:00+00:00 | 2017-08-24 00:14:13+00:00 | 2017-08-16 12:56:00+00:00 | 0.0 | 5 | 0 | 1 | 0 | 2 | 1 | 1 | 0.0 | 11 | False | 0.400000 | 11 | 0 | 0 | 0 | 13 | 7 | 0 | 0 | 1 | 0.090909 | 0 |
| 9473 | AVpf3VOfilAPnD_xjpun | 2015-09-11 18:17:13+00:00 | 2018-02-05 08:35:02+00:00 | 2012-02-11 00:00:00+00:00 | 2017-08-24 00:14:13+00:00 | 2017-08-16 12:56:00+00:00 | 0.0 | 5 | 0 | 1 | 0 | 2 | 1 | 1 | 0.0 | 12 | False | 0.050000 | 12 | 0 | 0 | 0 | 13 | 7 | 0 | 0 | 1 | 0.166667 | 0 |
| 9474 | AVpf3VOfilAPnD_xjpun | 2015-09-11 18:17:13+00:00 | 2018-02-05 08:35:02+00:00 | 2014-12-03 00:00:00+00:00 | 2017-08-24 00:14:13+00:00 | 2017-08-16 12:56:00+00:00 | 0.0 | 5 | 0 | 1 | 0 | 2 | 1 | 1 | 0.0 | 18 | False | 0.366667 | 18 | 0 | 0 | 0 | 13 | 7 | 1 | 0 | 0 | 0.055556 | 0 |
| 9475 | AVpf3VOfilAPnD_xjpun | 2015-09-11 18:17:13+00:00 | 2018-02-05 08:35:02+00:00 | 2014-12-30 00:00:00+00:00 | 2017-08-24 00:14:13+00:00 | 2017-08-16 12:56:00+00:00 | 0.0 | 5 | 0 | 1 | 0 | 2 | 1 | 1 | 0.0 | 13 | False | 0.400000 | 13 | 0 | 0 | 0 | 13 | 7 | 0 | 0 | 1 | 0.076923 | 0 |
| 9476 | AVpf3VOfilAPnD_xjpun | 2015-09-11 18:17:13+00:00 | 2018-02-05 08:35:02+00:00 | 2014-12-27 00:00:00+00:00 | 2017-08-24 00:14:13+00:00 | 2017-08-16 12:56:00+00:00 | 0.0 | 5 | 0 | 1 | 0 | 2 | 1 | 1 | 0.0 | 17 | False | 0.133333 | 17 | 0 | 0 | 0 | 13 | 7 | 1 | 0 | 0 | 0.058824 | 0 |
| 9477 | AVpf3VOfilAPnD_xjpun | 2015-09-11 18:17:13+00:00 | 2018-02-05 08:35:02+00:00 | 2014-12-06 00:00:00+00:00 | 2017-08-24 00:14:13+00:00 | 2017-08-16 12:56:00+00:00 | 0.0 | 5 | 0 | 1 | 0 | 2 | 1 | 1 | 0.0 | 13 | False | 0.266667 | 13 | 0 | 0 | 0 | 13 | 7 | 1 | 0 | 0 | 0.076923 | 0 |
| 9478 | AVpf3VOfilAPnD_xjpun | 2015-09-11 18:17:13+00:00 | 2018-02-05 08:35:02+00:00 | 2015-01-06 00:00:00+00:00 | 2017-08-24 00:14:13+00:00 | 2017-08-16 12:56:00+00:00 | 0.0 | 5 | 0 | 1 | 0 | 2 | 1 | 1 | 0.0 | 16 | False | 0.500000 | 16 | 0 | 0 | 0 | 13 | 7 | 1 | 0 | 0 | 0.062500 | 0 |
| 9479 | AVpf3VOfilAPnD_xjpun | 2015-09-11 18:17:13+00:00 | 2018-02-05 08:35:02+00:00 | 2015-01-17 00:00:00+00:00 | 2017-08-24 00:14:13+00:00 | 2017-08-16 12:56:00+00:00 | 0.0 | 5 | 0 | 1 | 0 | 2 | 1 | 1 | 0.0 | 14 | False | 0.000000 | 14 | 0 | 0 | 0 | 13 | 7 | 0 | 0 | 1 | 0.142857 | 0 |
Duplicate rows
Most frequently occurring
| | id | dateAdded | dateUpdated | reviews.date | reviews.dateAdded | reviews.dateSeen | reviews.numHelpful | reviews.rating | purchase_missing_flag | purchase_encoded | recommend_missing_flag | recommend_encoded | helpful_missing_flag | no_helpful_votes_flag | log_helpful | text_length | is_short | sentiment_polarity | review_length | username_dup_flag | multi_review_same_day_flag | multi_review_same_product_flag | brand_encoded | category_group_encoded | product_name_match_flag | unrelated_product_flag | semantic_mismatch_score | repetition_score | fake_review_label | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AV1YGDqsGV-KLJ3adc-O | 2017-07-18 23:46:09+00:00 | 2018-02-05 08:34:58+00:00 | 2014-10-30 00:00:00+00:00 | 2017-08-05 07:43:50+00:00 | 2017-07-19 23:58:00+00:00 | 0.0 | 5 | 0 | 1 | 0 | 2 | 1 | 1 | 0.0 | 9 | False | 0.650000 | 9 | 0 | 0 | 0 | 69 | 0 | 0 | 0 | 1 | 0.111111 | 0 | 2 |
| 1 | AV1YGDqsGV-KLJ3adc-O | 2017-07-18 23:46:09+00:00 | 2018-02-05 08:34:58+00:00 | 2014-10-30 00:00:00+00:00 | 2017-09-25 16:58:52+00:00 | 2017-09-18 03:50:00+00:00 | 0.0 | 5 | 0 | 1 | 0 | 2 | 1 | 1 | 0.0 | 9 | False | 0.000000 | 9 | 0 | 0 | 0 | 69 | 0 | 0 | 0 | 1 | 0.111111 | 0 | 2 |
| 2 | AV1YGDqsGV-KLJ3adc-O | 2017-07-18 23:46:09+00:00 | 2018-02-05 08:34:58+00:00 | 2014-10-30 00:00:00+00:00 | 2017-09-25 16:58:52+00:00 | 2017-09-18 03:50:00+00:00 | 0.0 | 5 | 0 | 1 | 0 | 2 | 1 | 1 | 0.0 | 12 | False | 0.366667 | 12 | 0 | 0 | 0 | 69 | 0 | 0 | 0 | 1 | 0.083333 | 0 | 2 |
| 3 | AV1YmDL9vKc47QAVgr7_ | 2017-07-19 02:05:55+00:00 | 2018-02-05 11:27:11+00:00 | 2016-10-05 00:00:00+00:00 | 2017-09-11 02:56:19+00:00 | 2017-09-04 04:19:00+00:00 | 0.0 | 5 | 0 | 1 | 0 | 2 | 1 | 1 | 0.0 | 14 | False | 0.465000 | 14 | 0 | 0 | 0 | 2 | 3 | 1 | 0 | 0 | 0.071429 | 0 | 2 |
| 4 | AV1l8zRZvKc47QAVhnAv | 2017-07-21 16:20:23+00:00 | 2018-02-05 11:28:34+00:00 | 2015-05-26 00:00:00+00:00 | 2017-09-24 08:19:42+00:00 | 2017-09-03 19:47:00+00:00 | 0.0 | 5 | 0 | 1 | 0 | 2 | 1 | 1 | 0.0 | 11 | False | 0.000000 | 11 | 0 | 0 | 0 | 41 | 8 | 0 | 0 | 1 | 0.090909 | 0 | 2 |
| 5 | AV1l8zRZvKc47QAVhnAv | 2017-07-21 16:20:23+00:00 | 2018-02-05 11:28:34+00:00 | 2015-06-01 00:00:00+00:00 | 2017-09-24 08:19:42+00:00 | 2017-09-03 19:47:00+00:00 | 0.0 | 5 | 0 | 1 | 0 | 2 | 1 | 1 | 0.0 | 15 | False | 0.250000 | 15 | 0 | 0 | 0 | 41 | 8 | 0 | 0 | 1 | 0.066667 | 0 | 2 |
| 6 | AVpe41TqilAPnD_xQH3d | 2017-01-15 18:09:31+00:00 | 2018-02-05 08:36:37+00:00 | 2016-12-14 00:00:00+00:00 | 2017-09-24 02:17:19+00:00 | 2017-09-21 07:50:00+00:00 | 0.0 | 5 | 1 | 0 | 0 | 2 | 0 | 1 | 0.0 | 6 | False | 0.000000 | 6 | 0 | 0 | 0 | 17 | 7 | 1 | 1 | 1 | 0.166667 | 0 | 2 |
| 7 | AVpe6FpaLJeJML43yBuP | 2015-11-13 04:18:48+00:00 | 2018-02-05 08:36:40+00:00 | 2014-09-25 00:00:00+00:00 | 2017-09-27 10:39:26+00:00 | 2017-09-08 02:54:00+00:00 | 0.0 | 5 | 1 | 0 | 0 | 2 | 0 | 1 | 0.0 | 7 | False | 0.700000 | 7 | 1 | 1 | 1 | 67 | 7 | 0 | 0 | 1 | 0.142857 | 1 | 2 |
| 8 | AVpf0eb2LJeJML43EVSt | 2015-11-15 09:05:44+00:00 | 2018-02-05 08:37:24+00:00 | 2015-02-21 00:00:00+00:00 | 2017-09-20 23:03:37+00:00 | 2017-09-02 06:16:00+00:00 | 0.0 | 5 | 1 | 0 | 0 | 2 | 0 | 1 | 0.0 | 8 | False | -0.250000 | 8 | 0 | 0 | 0 | 52 | 7 | 1 | 1 | 1 | 0.125000 | 0 | 2 |
| 9 | AVpf0eb2LJeJML43EVSt | 2015-11-15 09:05:44+00:00 | 2018-02-05 08:37:24+00:00 | 2017-01-20 00:00:00+00:00 | 2017-09-20 23:03:37+00:00 | 2017-09-02 06:15:00+00:00 | 0.0 | 5 | 1 | 0 | 0 | 2 | 0 | 1 | 0.0 | 7 | False | 0.000000 | 7 | 0 | 0 | 0 | 52 | 7 | 0 | 1 | 2 | 0.142857 | 0 | 2 |
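The "# duplicates" column counts how often each fully identical row occurs. The following is a minimal sketch of that grouping using only the standard library. The rows are made-up toy tuples (hypothetical ids and values, not taken from the dataset), and the real report may compute this differently.

```python
from collections import Counter

# Toy rows: (id, review date, rating, text_length). Values are illustrative only.
rows = [
    ("p1", "2014-10-30", 5, 9),
    ("p1", "2014-10-30", 5, 9),   # exact copy of the previous row
    ("p2", "2016-10-05", 5, 14),
    ("p3", "2015-05-26", 5, 11),
    ("p3", "2015-05-26", 5, 11),  # exact copy of the previous row
    ("p3", "2015-06-01", 5, 15),  # same id but different values: not a duplicate
]

# Count occurrences of each distinct row and keep those seen more than once.
occurrences = Counter(rows)
duplicate_groups = {row: count for row, count in occurrences.items() if count > 1}

for row, count in duplicate_groups.items():
    print(row, "# duplicates =", count)
```

Rows only count as duplicates when every column matches, so reviews that share an id but differ in any other field stay separate.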